NSF PAR Search | NSF Public Access Repository

In traditional reinforcement learning (RL), the learner aims to solve a single objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large-for tabular MDPs, as well as for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.

A Note on the Instantiability of the Quantum Random Oracle

https://doi.org/10.1007/978-3-030-44223-1_27

Eaton, E; Song, F. (January 2020, Post-Quantum Cryptography. PQCrypto 2020. Lecture Notes in Computer Science)

In a highly influential paper from fifteen years ago [10], Canetti, Goldreich, and Halevi showed a fundamental separation between the Random Oracle Model (ROM) and the standard model. They constructed a signature scheme which can be proven secure in the ROM, but is insecure when instantiated with any hash function (and thus insecure in the standard model). In 2011, Boneh et al. defined the notion of the Quantum Random Oracle Model (QROM), where queries to the random oracle may be made in quantum superposition. Because the QROM generalizes the ROM, a proof of security in the QROM is stronger than one in the ROM. This leaves open the possibility that security in the QROM could imply security in the standard model. In this work, we show that this is not the case, and that security in the QROM cannot imply standard-model security. We do this by showing that the original schemes that show a separation between the standard model and the ROM are also secure in the QROM. We consider two schemes that establish such a separation, one with length-restricted messages, and one without, and show both to be secure in the QROM. Our results give further understanding to the landscape of proofs in the ROM versus the QROM or standard model, and point towards the QROM and ROM being much closer to each other than either is to standard model security.

Full Text Available

Search for: All records